4,131 research outputs found

    Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

    In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain. A critical problem is how to effectively capture the rich semantics inside the video using the image encoder of CLIP. To tackle this, state-of-the-art methods adopt complex cross-modal modeling techniques to fuse the text information into video frame representations. This, however, incurs severe efficiency issues in large-scale retrieval systems, as the video representations must be recomputed online for every text query. In this paper, we discard this problematic cross-modal fusion process and aim to learn semantically-enhanced representations purely from the video, so that the video representations can be computed offline and reused for different texts. Concretely, we first introduce a spatial-temporal "Prompt Cube" into the CLIP image encoder and iteratively switch it within the encoder layers to efficiently incorporate the global video semantics into frame representations. We then propose to apply an auxiliary video captioning objective to train the frame representations, which facilitates the learning of detailed video semantics by providing fine-grained guidance in the semantic space. With a naive temporal fusion strategy (i.e., mean-pooling) on the enhanced frame representations, we obtain state-of-the-art performance on three benchmark datasets, i.e., MSR-VTT, MSVD, and LSMDC. Comment: to appear in ICCV202
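The offline-reuse idea in this abstract can be illustrated with a minimal sketch. It assumes frame and text embeddings are already available as plain lists of floats; the toy vectors, the video names, and the helper names `mean_pool`, `build_index`, and `retrieve` are hypothetical and are not taken from the paper. It shows only mean-pooling plus cosine ranking, not the Prompt Cube or the captioning objective.

```python
import math

def mean_pool(frames):
    """Average per-frame embeddings into one video embedding."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_index(videos):
    """Offline step: pool each video once, independent of any text query."""
    return {name: mean_pool(frames) for name, frames in videos.items()}

def retrieve(index, text_embedding):
    """Online step: rank the precomputed video embeddings for one query."""
    return sorted(index, key=lambda name: cosine(index[name], text_embedding),
                  reverse=True)

# Toy 2-dimensional embeddings (made up for illustration only).
videos = {
    "cooking": [[1.0, 0.0], [0.8, 0.2]],
    "soccer": [[0.0, 1.0], [0.2, 0.8]],
}
index = build_index(videos)
ranking = retrieve(index, [0.9, 0.1])
```

Because `build_index` never sees the text, the same index can serve every query, which is the efficiency argument the abstract makes against per-query cross-modal fusion.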

    Identity-Consistent Aggregation for Video Object Detection

    In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame. Existing methods treat the temporal contexts obtained from different objects indiscriminately and ignore their different identities. Intuitively, however, aggregating local views of the same object in different frames may facilitate a better understanding of the object. Thus, in this paper, we aim to enable the model to focus on the identity-consistent temporal contexts of each object to obtain more comprehensive object representations and handle rapid object appearance variations such as occlusion and motion blur. However, realizing this goal on top of existing VID models faces low-efficiency problems due to their redundant region proposals and nonparallel frame-wise prediction manner. To address this, we propose ClipVID, a VID model equipped with Identity-Consistent Aggregation (ICA) layers specifically designed for mining fine-grained and identity-consistent temporal contexts. It effectively reduces the redundancies through the set prediction strategy, making the ICA layers very efficient and further allowing us to design an architecture that makes parallel clip-wise predictions for the whole video clip. Extensive experimental results demonstrate the superiority of our method: a state-of-the-art (SOTA) performance (84.7% mAP) on the ImageNet VID dataset while running at a speed about 7x faster (39.3 fps) than previous SOTA methods. Comment: to appear at ICCV202

    Minimizing the number of edges in $K_{s,t}$-saturated bipartite graphs

    This paper considers an edge minimization problem in saturated bipartite graphs. An $n$ by $n$ bipartite graph $G$ is $H$-saturated if $G$ does not contain a subgraph isomorphic to $H$ but adding any missing edge to $G$ creates a copy of $H$. More than half a century ago, Wessel and Bollobás independently solved the problem of minimizing the number of edges in $K_{(s,t)}$-saturated graphs, where $K_{(s,t)}$ is the 'ordered' complete bipartite graph with $s$ vertices from the first color class and $t$ from the second. However, the very natural 'unordered' analogue of this problem was considered only half a decade ago by Moshkovitz and Shapira. When $s=t$, it can be easily checked that the unordered variant is exactly the same as the ordered case. Later, Gan, Korándi, and Sudakov gave an asymptotically tight bound on the minimum number of edges in $K_{s,t}$-saturated $n$ by $n$ bipartite graphs, which is only smaller than the conjecture of Moshkovitz and Shapira by an additive constant. In this paper, we confirm their conjecture for $s=t-1$ with the classification of the extremal graphs. We also improve the estimates of Gan, Korándi, and Sudakov for general $s$ and $t$, and for all sufficiently large $n$. Comment: Reflected minor suggestions from reviewer
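The saturation definition in this abstract can be checked by brute force on small graphs. The sketch below is only an illustration of the definition, not a method from the paper. It assumes the graph is stored as a list `adj` where `adj[u]` is the set of right-side neighbours of left vertex `u`, and that `s` and `t` are at least 1; the function names are hypothetical.

```python
from itertools import combinations

def has_ordered(adj, s, t):
    """True if some s left vertices share at least t common right neighbours,
    i.e. the graph contains the 'ordered' K_(s,t)."""
    for lefts in combinations(range(len(adj)), s):
        common = set.intersection(*(adj[u] for u in lefts))
        if len(common) >= t:
            return True
    return False

def has_unordered(adj, s, t):
    """True if K_{s,t} appears in either orientation."""
    return has_ordered(adj, s, t) or has_ordered(adj, t, s)

def is_saturated(adj, s, t, contains=has_unordered):
    """Check the definition directly: no copy of H is present, and every
    missing left-right edge creates one when added."""
    n = len(adj)
    if contains(adj, s, t):
        return False
    for u in range(n):
        for v in range(n):
            if v not in adj[u]:
                adj[u].add(v)          # tentatively add the missing edge
                created = contains(adj, s, t)
                adj[u].remove(v)       # restore the graph
                if not created:
                    return False
    return True

# 2 by 2 examples for K_{2,2}: three of the four possible edges are
# saturated, while a perfect matching is not.
three_edges = [{0, 1}, {0}]
matching = [{0}, {1}]
```

Passing `contains=has_ordered` switches the check to the ordered variant, which coincides with the unordered one when $s=t$, as the abstract notes. The search is exponential in $s$ and $t$, so it is useful only for verifying small examples.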

    Vertex Downgrading to Minimize Connectivity

    We consider the problem of interdicting a directed graph by deleting nodes with the goal of minimizing the local edge connectivity of the remaining graph from a given source to a sink. We introduce and study a general downgrading variant of the interdiction problem where the capacity of an arc is a function of the subset of its endpoints that are downgraded, and the goal is to minimize the downgraded capacity of a minimum source-sink cut subject to a node downgrading budget. This models, for example, the case when both ends of an arc must be downgraded to remove it. For this generalization, we provide a bicriteria (4,4)-approximation that downgrades nodes with total weight at most 4 times the budget and provides a solution where the downgraded connectivity from the source to the sink is at most 4 times that in an optimal solution. We accomplish this with an LP relaxation and rounding using a ball-growing algorithm based on the LP values. We further generalize the downgrading problem to one where each vertex can be downgraded to one of k levels, and the arc capacities are functions of the pairs of levels to which its ends are downgraded. We generalize our LP rounding to get a (4k,4k)-approximation for this case.
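To make the objective in this abstract concrete, here is a small exhaustive-search sketch on a toy instance. It is not the LP rounding or ball-growing algorithm the abstract describes; it simply tries every node subset within the budget. The capacity rule (an arc drops to 0 only when both endpoints are downgraded), the assumption that any node including the source and sink may be downgraded, and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def max_flow(n, cap, s, t):
    """Edmonds-Karp max flow; by max-flow/min-cut this equals the min s-t cut.
    cap maps arcs (u, v) to non-negative capacities."""
    res = [[0] * n for _ in range(n)]
    for (u, v), c in cap.items():
        res[u][v] += c
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:      # BFS for an augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                          # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

def downgraded_capacity(arcs, downgraded):
    """Assumed rule: an arc is removed only if BOTH endpoints are downgraded."""
    return {(u, v): (0 if u in downgraded and v in downgraded else c)
            for (u, v), c in arcs.items()}

def best_downgrade(n, arcs, weight, budget, s, t):
    """Try every node subset within budget; return (min cut value, subset)."""
    best = (max_flow(n, arcs, s, t), frozenset())
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(weight[v] for v in subset) <= budget:
                value = max_flow(n, downgraded_capacity(arcs, set(subset)), s, t)
                if value < best[0]:
                    best = (value, frozenset(subset))
    return best

# Toy instance: source 0, sink 3, two disjoint paths with capacities 3 and 2.
arcs = {(0, 1): 3, (0, 2): 2, (1, 3): 3, (2, 3): 2}
result = best_downgrade(4, arcs, [1, 1, 1, 1], 2, 0, 3)
```

With unit weights and a budget of 2, the search can remove one arc of the heavier path, leaving only the capacity-2 path. The search is exponential in the number of nodes, which is why an approximation algorithm is of interest for realistic instances.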

    Plasma exosomal microRNAs are non-invasive biomarkers of moyamoya disease: A pilot study

    Background: As a progressive cerebrovascular disease, Moyamoya Disease (MMD) is a common cause of stroke in children and adults. However, the early biomarkers and pathogenesis of MMD remain poorly understood. Methods and materials: This study was conducted using plasma exosome samples from MMD patients. Next-generation high-throughput sequencing, real-time quantitative PCR, gene ontology analysis, and Kyoto Encyclopaedia of Genes and Genomes pathway analysis were performed on exosomal miRNAs that could be used as potential biomarkers of MMD. The area under the Receiver Operating Characteristic (ROC) curve was used to evaluate the sensitivity and specificity of biomarkers for predicting events. Results: Exosomes were successfully isolated, and miRNA-sequence analysis yielded 1,002 differentially expressed miRNAs. Functional analysis revealed that they were mainly enriched in axon guidance, regulation of the actin cytoskeleton, and the MAPK signaling pathway. Furthermore, 10 miRNAs (miR-1306-5p, miR-196b-5p, miR-19a-3p, miR-22-3p, miR-320b, miR-34a-5p, miR-485-3p, miR-489-3p, miR-501-3p, and miR-487-3p) were found to be associated with the most sensitive and specific pathways for MMD prediction. Conclusions: Several plasma secretory miRNAs closely related to the development of MMD have been identified, which can be used as biomarkers of MMD and contribute to differentiating MMD from non-MMD patients before digital subtraction angiography.

    Cyclically 5-Connected Graphs

    Tutte's Four-Flow Conjecture states that every bridgeless, Petersen-free graph admits a nowhere-zero 4-flow. This hard conjecture has been open for over half a century, with no significant progress in the first forty years. In recent decades, Robertson, Thomas, Sanders and Seymour have proved the cubic version of this conjecture. Their strategy involved the study of the class of cyclically 5-connected cubic graphs. It turns out that a minimum counterexample to the general Four-Flow Conjecture is also cyclically 5-connected. Motivated by this fact, we wish to find structural properties of this class in hopes of producing a list of minor-minimal cyclically 5-connected graphs.

    5-Hydroxy-1-(3-hydroxy-2-naphthoyl)-3,5-dimethyl-2-pyrazoline

    In the title molecule, C16H16N2O3, intramolecular O—H⋯O hydrogen bonds influence the molecular conformation. Intermolecular O—H⋯O hydrogen bonds [O⋯O = 2.922 (2) Å] link the molecules into centrosymmetric dimers. Weak intermolecular C—H⋯O interactions assemble these dimers into layers parallel to the bc plane.